Spectral Analysis for Billion-Scale Graphs: Discoveries and Implementation
Abstract
Given a graph with billions of nodes and edges, how can we find patterns and anomalies? Are there nodes that participate in too many or too few triangles? Are there close-knit near-cliques? These questions are expensive to answer unless we have the first several eigenvalues and eigenvectors of the graph adjacency matrix. However, eigensolvers suffer from subtle problems (e.g., convergence) for large sparse matrices, let alone for billion-scale ones. We address this problem with the proposed HEIGEN algorithm, which we carefully design to be accurate, efficient, and able to run on the highly scalable MAPREDUCE (HADOOP) environment. This enables HEIGEN to handle matrices more than 1000× larger than those which can be analyzed by existing algorithms. We implement HEIGEN and run it on the M45 cluster, one of the top 50 supercomputers in the world. We report important discoveries about near-cliques and triangles on several real-world graphs, including a snapshot of the Twitter social network (38 GB, 2 billion edges) and the “YahooWeb” dataset, one of the largest publicly available graphs (120 GB, 1.4 billion nodes, 6.6 billion edges).
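To illustrate why the leading eigenvalues and eigenvectors help with triangle questions, here is a minimal sketch on a toy graph. It relies on a standard identity for an undirected simple graph with adjacency matrix $A$: the total number of triangles is $\frac{1}{6}\sum_i \lambda_i^3$, and node $i$ participates in $\frac{1}{2}\sum_j \lambda_j^3 u_j[i]^2$ triangles, where $(\lambda_j, u_j)$ are the eigenpairs of $A$. This is not the HEIGEN algorithm, which targets MapReduce; it uses a dense in-memory solver (`numpy.linalg.eigh`), and the function name and the parameter `k` are hypothetical names chosen for this example.

```python
import numpy as np

def triangle_counts_from_spectrum(adj, k=None):
    """Global and per-node triangle counts from eigenpairs of `adj`.

    adj: symmetric 0/1 adjacency matrix with a zero diagonal.
    k:   number of largest-magnitude eigenpairs to keep.
         None keeps all of them, which makes the counts exact;
         a small k gives an approximation.
    """
    eigvals, eigvecs = np.linalg.eigh(adj)  # dense symmetric eigensolver
    if k is not None:
        top = np.argsort(-np.abs(eigvals))[:k]
        eigvals, eigvecs = eigvals[top], eigvecs[:, top]
    cubes = eigvals ** 3
    total = cubes.sum() / 6.0                # trace(A^3) / 6
    per_node = (eigvecs ** 2) @ cubes / 2.0  # diag(A^3) / 2
    return total, per_node

# Toy graph: a 4-clique on nodes 0..3 plus node 4 attached only to node 0.
adj = np.zeros((5, 5))
for i in range(4):
    for j in range(i + 1, 4):
        adj[i, j] = adj[j, i] = 1.0
adj[0, 4] = adj[4, 0] = 1.0

total, per_node = triangle_counts_from_spectrum(adj)
print(round(total), np.round(per_node).astype(int))
```

With all eigenpairs kept, the sketch recovers the four triangles of the 4-clique, three per clique member and none for the pendant node. Passing a small `k` trades exactness for needing only the top few eigenpairs, which is the regime the abstract refers to.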
Related resources
A particle swarm optimization algorithm for minimization analysis of cost-sensitive attack graphs
To prevent an exploit, the security analyst must implement a suitable countermeasure. In this paper, we consider cost-sensitive attack graphs (CAGs) for network vulnerability analysis. In these attack graphs, a weight is assigned to each countermeasure to represent the cost of its implementation. There may be multiple countermeasures with different weights for preventing a single exploit. Also,...
Mining Tera-Scale Graphs: Theory, Engineering and Discoveries
How do we find patterns and anomalies on graphs with billions of nodes and edges, which do not fit in memory? How can we use parallelism for such Tera- or Peta-scale graphs? In this thesis, we propose PEGASUS, a large-scale graph mining system implemented on top of the HADOOP platform, the open source version of MAPREDUCE. PEGASUS includes algorithms which help us spot patterns and anomalous beh...
SIGNLESS LAPLACIAN SPECTRAL MOMENTS OF GRAPHS AND ORDERING SOME GRAPHS WITH RESPECT TO THEM
Let $G = (V, E)$ be a simple graph. Denote by $D(G)$ the diagonal matrix $\mathrm{diag}(d_1,\cdots,d_n)$, where $d_i$ is the degree of vertex $i$, and by $A(G)$ the adjacency matrix of $G$. The signless Laplacian matrix of $G$ is $Q(G) = D(G) + A(G)$ and the $k$-th signless Laplacian spectral moment of graph $G$ is defined as $T_k(G)=\sum_{i=1}^{n}q_i^{k}$, $k \geqslant 0$, where $q_1, q_2, \cdots, q_n$ ...
Net-Ray: Visualizing and Mining Billion-Scale Graphs
How can we visualize billion-scale graphs? How to spot outliers in such graphs quickly? Visualizing graphs is the most direct way of understanding them; however, billion-scale graphs are very difficult to visualize since the amount of information overflows the resolution of a typical screen. In this paper we propose NET-RAY, an open-source package for visualization-based mining on billion-scale ...
Entropy Generation of Variable Viscosity and Thermal Radiation on Magneto Nanofluid Flow with Dusty Fluid
The present work examines a variable-viscosity dusty nanofluid flowing over a permeable stretched sheet with thermal radiation. The problem is modelled mathematically, including the mixed convective condition and the magnetic effect. Additionally, analysis of entropy generation and the Bejan number provides the finer details of the flow. The model equations are transformed into non-linear or...
Publication date: 2011